FIXES: Fix authentication, repo-stats API, and Discord bot startup issues#239
FIXES: Fix authentication, repo-stats API, and Discord bot startup issues#239vinay1359 wants to merge 7 commits intoAOSSIE-Org:mainfrom
Conversation
📝 WalkthroughWalkthroughAwait Supabase async auth call; add a cached, rate-limited Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Frontend Client
participant API as Backend (FastAPI)
participant Auth as Supabase Auth
participant Cache as In-Memory Cache
participant GitHub as GitHub REST API
Client->>API: POST /api/repo-stats { repo_url, Authorization }
activate API
API->>Auth: await supabase.auth.get_user(token)
Auth-->>API: user info / 401
alt authenticated
API->>Cache: get(key=owner/repo)
alt cache hit
Cache-->>API: cached repo data
API-->>Client: 200 OK + cached data
else cache miss
API->>GitHub: GET /repos/{owner}/{repo} (Authorization header)
GitHub-->>API: 200 / 404 / 403 / other
alt success
API->>Cache: set(key, result, ttl=300s)
API-->>Client: 200 OK + mapped repo data
else error
API-->>Client: appropriate HTTP error (404/403/500)
end
end
else auth failure
API-->>Client: 401 Unauthorized
end
deactivate API
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@backend/routes.py`:
- Around line 54-58: Before building the GitHub API request headers in
backend/routes.py, validate that settings.github_token is set (non-empty) and
handle the missing token like in backend/app/services/github/user/profiling.py:
check settings.github_token, and if absent raise or return a clear error (or log
and abort the request) with an explanatory message about the missing GitHub
token so you don’t construct an invalid Authorization header; update the code
path that constructs headers = {"Authorization": f"token
{settings.github_token}", ...} to only run after this validation.
- Around line 46-52: The repo_url parsing is brittle; update the logic around
request.repo_url (and the variables repo_url, parts, owner, repo) to normalize
and robustly handle SSH URLs, URLs without protocol, and .git suffixes: first
trim whitespace, handle SSH format starting with "git@" by splitting on ":" to
get the path, otherwise ensure a scheme (prepend "https://" if missing) and use
a URL parser to get the path, strip leading/trailing slashes and any trailing
".git", split the path into segments, validate there are at least two segments,
then assign owner = segments[0] and repo = segments[1]; raise the same
HTTPException on invalid inputs.
🧹 Nitpick comments (2)
backend/integrations/discord/cogs.py (1)
43-45: Consider usinglogger.debug()instead ofprint()for consistency.The file uses
loggerelsewhere (e.g., line 47). Usingprint()for tracing bypasses log configuration (levels, formatting, handlers) and may clutter stdout in production.♻️ Suggested refactor
async def cleanup_expired_tokens(self): """Periodic cleanup of expired verification tokens""" try: - print("--> Running token cleanup task...") + logger.debug("Running token cleanup task...") await cleanup_expired_tokens() - print("--> Token cleanup task finished.") + logger.debug("Token cleanup task finished.") except Exception as e: logger.error(f"Error during token cleanup: {e}") `@cleanup_expired_tokens.before_loop` async def before_cleanup(self): """Wait until the bot is ready before starting cleanup""" - print("--> Waiting for bot to be ready before starting cleanup task...") + logger.debug("Waiting for bot to be ready before starting cleanup task...") await self.bot.wait_until_ready() - print("--> Bot is ready, starting cleanup task...") + logger.debug("Bot is ready, starting cleanup task...")Also applies to: 52-54
backend/routes.py (1)
91-95: Improve exception handling per best practices.The static analysis hints are valid here. Consider:
- Using
logging.exceptionto capture stack traces- Using
raise ... from efor exception chaining- Avoiding exposing internal error details to clients
♻️ Suggested improvement
except HTTPException: raise except Exception as e: - logging.error(f"Error fetching repo stats: {e}") - raise HTTPException(status_code=500, detail=str(e)) + logging.exception("Error fetching repo stats") + raise HTTPException(status_code=500, detail="Failed to fetch repository statistics") from e
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
frontend/package-lock.jsonis excluded by!**/package-lock.jsonpoetry.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
backend/app/core/dependencies.pybackend/integrations/discord/cogs.pybackend/main.pybackend/routes.pypyproject.toml
🧰 Additional context used
🧬 Code graph analysis (1)
backend/integrations/discord/cogs.py (1)
backend/app/core/orchestration/queue_manager.py (1)
start(47-56)
🪛 Ruff (0.14.11)
backend/routes.py
50-50: Abstract raise to an inner function
(TRY301)
68-71: Abstract raise to an inner function
(TRY301)
93-93: Do not catch blind exception: Exception
(BLE001)
94-94: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
95-95: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (4)
backend/integrations/discord/cogs.py (1)
35-37: LGTM! Good defensive check before cancellation.Checking
is_running()before callingcancel()prevents potential errors when the task hasn't started or has already completed.pyproject.toml (1)
11-11: Remove unusedpy-corddependency.Both
py-cordanddiscord-pyare listed inpyproject.toml(lines 11 and 29), but the codebase only imports fromdiscord(discord.py). Thepy-corddependency is unused and should be removed to avoid unnecessary bloat and potential namespace conflicts.backend/app/core/dependencies.py (1)
47-59: LGTM! Correct fix for the async authentication bug.The
awaitkeyword properly resolves the coroutine before accessinguser_response.user, fixing the "coroutine object has no attribute 'user'" error. Theget_supabase_client()function returns anAsyncClientinstance, andsupabase.auth.get_user(token)is correctly awaited. Error handling is well-structured.backend/main.py (1)
17-17: Router registration is correct with no route conflicts.The legacy router is properly imported and registered with the
/apiprefix. Routes are cleanly separated: legacy endpoints at/api/repo-statsand/api/github/webhook, while the main API routes remain under/v1/*.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/routes.py`:
- Around line 124-128: The generic exception handler leaks internal details and
doesn't preserve traceback; replace the current block so that you call
logging.exception(...) to log the error with its traceback, return a generic
client-facing message (e.g., "Internal server error") instead of str(e) when
raising HTTPException, and re-raise using exception chaining (raise
HTTPException(status_code=500, detail="Internal server error") from e) so the
original exception context is preserved; update the except Exception as e: block
in backend/routes.py accordingly.
🧹 Nitpick comments (2)
backend/routes.py (2)
62-69: Unreachable exception handler.The
except (IndexError, ValueError)block at lines 68-69 is dead code. Since line 64 already checksif "github.com/" in repo_url, thesplit("github.com/", 1)[1]on line 65 will always succeed and return a string (never raiseIndexErrororValueError).♻️ Suggested simplification
# Extract path from URL - try: - # Parse URL to get path component - if "github.com/" in repo_url: - path = repo_url.split("github.com/", 1)[1] - else: - raise HTTPException(status_code=400, detail="Invalid GitHub repository URL") - except (IndexError, ValueError): + # Parse URL to get path component + if "github.com/" in repo_url: + path = repo_url.split("github.com/", 1)[1] + else: raise HTTPException(status_code=400, detail="Invalid GitHub repository URL")
79-97: Fix inconsistent indentation on comment lines.Lines 80 and 86 have extra leading whitespace before the comments, causing misalignment with the surrounding code. This appears to be a formatting error.
♻️ Suggested fix
owner, repo = segments[0], segments[1] - # Validate GitHub token is configured + + # Validate GitHub token is configured if not settings.github_token: raise HTTPException( status_code=500, detail="GitHub token not configured in environment variables. Please set GITHUB_TOKEN." ) - # Fetch repository stats from GitHub API + + # Fetch repository stats from GitHub API headers = {
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/routes.py
🧰 Additional context used
🪛 Ruff (0.14.11)
backend/routes.py
55-55: Abstract raise to an inner function
(TRY301)
69-69: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
77-77: Abstract raise to an inner function
(TRY301)
82-85: Abstract raise to an inner function
(TRY301)
101-104: Abstract raise to an inner function
(TRY301)
126-126: Do not catch blind exception: Exception
(BLE001)
127-127: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
128-128: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (3)
backend/routes.py (3)
10-11: LGTM!The imports for
settingsandhttpxare appropriate for the new repo-stats endpoint functionality.
15-16: LGTM!The
RepoRequestmodel is simple and sufficient. URL validation is handled in the endpoint logic.
99-123: LGTM!The error handling now logs the full error details server-side while returning a sanitized message to clients, addressing the previous concern about exposing raw GitHub API responses. The response structure is well-organized.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/routes.py`:
- Around line 68-69: The except block catching IndexError/ValueError should
preserve the original exception context: change it to "except (IndexError,
ValueError) as e:" and re-raise the HTTPException using exception chaining
(e.g., "raise HTTPException(status_code=400, detail='Invalid GitHub repository
URL') from e") so the original error is retained for debugging while still
returning the HTTPException to callers.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/routes.py
🧰 Additional context used
🪛 Ruff (0.14.11)
backend/routes.py
55-55: Abstract raise to an inner function
(TRY301)
69-69: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
77-77: Abstract raise to an inner function
(TRY301)
82-85: Abstract raise to an inner function
(TRY301)
101-104: Abstract raise to an inner function
(TRY301)
🔇 Additional comments (4)
backend/routes.py (4)
10-11: LGTM!Appropriate imports for the new functionality -
settingsfor configuration access andhttpxfor async HTTP requests.
15-16: LGTM!Simple and functional request model. URL validation is handled in the endpoint logic.
80-104: LGTM - Token validation and API error handling are well implemented.Good practices applied:
- Token validation before making the API request
- Error details logged server-side while returning a sanitized message to the client
- Reasonable timeout configured for the external call
Minor cosmetic note: Lines 80 and 86 have inconsistent comment indentation.
106-128: LGTM - Response structure and exception handling are well implemented.The response returns a clean, structured subset of repository data. The exception handling correctly:
- Re-raises
HTTPExceptionwithout modification- Uses
logging.exceptionfor automatic traceback inclusion- Returns a generic error message to clients
- Preserves exception context with
from e
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
|
Great work on fixing these critical issues! The fixes for authentication and Discord bot lifecycle are solid. However, before merge, the /api/repo-stats endpoint needs these architectural improvements: CRITICAL: Missing Rate Limiting & Caching Without these, a single attacker can exhaust your entire GitHub API quota (60 reqs/min). Implement: from slowapi import Limiter @router.post("/repo-stats") Security: Add Authentication The endpoint is currently public. Add @require_auth decorator and dependency injection: async def get_repo_stats( Exception Logging: Info Disclosure Risk Change logging.exception() to avoid logging full stacktraces: except Exception as e: URL Parsing: Add Validation Validate GitHub username/repo format: import re Discord Bot: Minor Improvement Wrap task cancellation in try/except to handle all edge cases: try: Severity: Rate limiting/caching = MUST FIX | Others = SHOULD FIX before merge Impact: Current code is functional but production-risky. These 5-10 minutes of fixes prevent API quota exhaustion and security vulnerabilities. |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@backend/app/core/cache.py`:
- Around line 45-83: The eviction logic in Cache.set uses int(self.max_size *
0.1) which becomes 0 for max_size < 10, allowing the cache to exceed capacity;
update the eviction calculation in set to compute an evict_count = max(1,
int(self.max_size * 0.1)) (or max(1, round(...)) as you prefer) and use
evict_count instead of int(self.max_size * 0.1) when slicing old_keys so that at
least one entry is removed when len(self._cache) >= self.max_size; keep the
removals within the existing with self._lock block and preserve the debug
message that reports len(old_keys).
In `@backend/main.py`:
- Around line 111-117: The RateLimitExceeded handler currently returns a plain
dict (causing a 200 response); change the handler to return a JSONResponse with
status_code=429 and the same payload. Replace the lambda passed to
api.add_exception_handler(RateLimitExceeded, ...) with a small callable that
constructs fastapi.responses.JSONResponse(content={"detail":"Rate limit
exceeded. Maximum 10 requests per minute per IP.","retry_after":60},
status_code=429) so requests caught by the limiter (get_limiter /
api.state.limiter) correctly return HTTP 429.
In `@backend/routes.py`:
- Around line 51-84: The route handler get_repo_stats currently uses a Pydantic
body param named request: RepoRequest which prevents the limiter from accessing
the HTTP Request; change the signature to accept request: Request (from fastapi)
as the first param and rename the body model to repo_request: RepoRequest, then
update all uses of request.repo_url to repo_request.repo_url inside
get_repo_stats; additionally remove the unused local call limiter =
get_limiter() since the global limiter/decorator is used.
- Around line 20-21: Current GITHUB_IDENTIFIER regex rejects dots in repository
names; create two regex constants GITHUB_OWNER = re.compile(r'^[a-zA-Z0-9_-]+$')
and GITHUB_REPO = re.compile(r'^[a-zA-Z0-9._-]+$') to reflect GitHub rules,
replace the existing GITHUB_IDENTIFIER with these two, and update the owner and
repo validations to use GITHUB_OWNER.match(owner) and GITHUB_REPO.match(repo)
respectively (look for the variables owner and repo and any usages of
GITHUB_IDENTIFIER.match).
🧹 Nitpick comments (1)
backend/routes.py (1)
64-64: Remove unused locallimiter.
Line 64 assignslimiter = get_limiter()but never uses it.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/routes.py`:
- Around line 68-106: The repo URL handling is failing when URLs contain query
strings or fragments (e.g., ?tab= or `#readme`); modify the HTTP/HTTPS branch to
parse the URL with urllib.parse.urlparse and use the parsed.path (which excludes
query/fragment) to build path instead of splitting on "github.com/". After
obtaining path from urlparse, strip leading/trailing slashes and the ".git"
suffix (same as current removesuffix logic) before splitting into segments;
adjust references to repo_url, path, and segments in backend/routes.py
accordingly so query/fragment parts no longer produce invalid repo paths.
♻️ Duplicate comments (1)
backend/app/core/cache.py (1)
45-48: Consider a stronger hash than MD5 for cache keys.MD5 collisions are feasible; a stronger hash avoids accidental key collisions and addresses the S324 warning. This mirrors earlier guidance.
🔒 Suggested change
- return hashlib.md5(key_str.encode()).hexdigest() + return hashlib.sha256(key_str.encode()).hexdigest()
🧹 Nitpick comments (3)
backend/app/services/codegraph/repo_service.py (1)
18-23: Extract the long error message to satisfy Ruff TRY003.Consider moving the long message to a constant or custom exception class to keep the constructor lean and align with the linter.
♻️ Suggested tweak
- if not self.secret_token: - raise ValueError( - "SECRET_TOKEN environment variable must be set for CodeGraph backend authentication. " - "Please configure this in your .env file." - ) + if not self.secret_token: + raise ValueError(MISSING_SECRET_TOKEN_MESSAGE)MISSING_SECRET_TOKEN_MESSAGE = ( "SECRET_TOKEN environment variable must be set for CodeGraph backend authentication. " "Please configure this in your .env file." )backend/app/database/falkor/code-graph-backend/api/index.py (1)
74-77: Centralize repo-name validation to avoid drift.You’re repeating the same regex + inline import in three endpoints. Extract a module-level regex and a small helper to keep validation consistent and avoid copy/paste changes.
♻️ Suggested refactor
@@ -import os +import os +import re @@ +REPO_NAME_RE = re.compile(r'^[a-zA-Z0-9_\-\.]+$') + +def _is_valid_repo_name(repo: str) -> bool: + return bool(REPO_NAME_RE.match(str(repo))) @@ def graph_entities(): - import re - if not re.match(r'^[a-zA-Z0-9_\-\.]+$', str(repo)): + if not _is_valid_repo_name(repo): return jsonify({'status': 'Invalid repository name format'}), 400 @@ def auto_complete(): - import re - if not re.match(r'^[a-zA-Z0-9_\-\.]+$', str(repo)): + if not _is_valid_repo_name(repo): return jsonify({'status': 'Invalid repository name format'}), 400 @@ def find_paths(): - import re - if not re.match(r'^[a-zA-Z0-9_\-\.]+$', str(repo)): + if not _is_valid_repo_name(repo): return jsonify({'status': 'Invalid repository name format'}), 400Also applies to: 81-82, 170-173, 182-183, 287-290
backend/app/core/cache.py (1)
142-161: Preserve wrapped function metadata.Adding
functools.wrapskeeps names/docs intact for debugging and tooling.♻️ Suggested change
+from functools import wraps @@ def decorator(func: Callable) -> Callable: - async def wrapper(*args, **kwargs): + `@wraps`(func) + async def wrapper(*args, **kwargs): cache = get_cache()
|
Thanks for the review, really appreciate it. |
Closes #238
📝 Description
This PR fixes several bugs that prevented the application from running properly.
What was broken:
Authentication wasn't working
Users could not log in to the website because the code was attempting to access
useron a coroutine object instead of awaiting it first, resulting in authentication failures.Repo stats endpoint was missing
The
/api/repo-statsendpoint was defined inroutes.pybut was not registered with the main FastAPI application, causing the endpoint to return a 404 response.Discord bot crashed on startup
The Discord bot attempted to run a cleanup task before the bot was fully initialized, which caused a runtime error during startup.
🔧 Changes Made
What I fixed:
File: dependencies.py
Added
awaitbeforesupabase.auth.get_user(token)so the async call completes before accessing the user object.File: main.py
Imported the legacy router from
routes.pyand registered it with the FastAPI app usingapi.include_router(legacy_router, prefix="/api").This makes the repo stats endpoint available at
/api/repo-stats.File: routes.py
Added the missing
/repo-statsPOST endpoint implementation.The endpoint uses
httpxto call the GitHub API, fetches repository data (stars, forks, issues, etc.), parses the repository URL, and returns structured JSON.File: cogs.py
Improved the cleanup task to wait until the Discord bot is fully ready before running.
Added logging to indicate when the task is waiting versus running.
Fixed the
cog_unloadmethod to check whether the task is running before attempting to cancel it.Testing:
/api/repo-statsendpoint returns valid GitHub repository data.✅ Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.